Mon. Not. R. Astron. Soc. 000, 000-000 (0000) Printed 2 February 2008 (MN LaTeX style file v1.4)



Galaxy Occupation Statistics of Dark Matter Haloes: 
Observational Results 



Xiaohu Yang^1, H.J. Mo^1, Y.P. Jing^2, Frank C. van den Bosch^3

^1 Department of Astronomy, University of Massachusetts, Amherst MA 01003-9305, USA

^2 Shanghai Astronomical Observatory; the Partner Group of MPA, Nandan Road 80, Shanghai 200030, China

^3 Department of Physics, Swiss Federal Institute of Technology, ETH Hönggerberg, CH-8093, Zurich, Switzerland



ABSTRACT 

We study the occupation statistics of galaxies in dark matter haloes using galaxy
groups identified from the 2-degree Field Galaxy Redshift Survey with the halo-based
group finder of Yang et al. (2004b). The occupation distribution is considered
separately for early and late type galaxies, as well as in terms of central and
satellite galaxies. The mean luminosity of the central galaxies scales with halo mass
approximately as $L_c \propto M^{2/3}$ for haloes with masses
$M < 10^{13} h^{-1} M_\odot$, and as $L_c \propto M^{1/4}$ for more massive haloes.
The characteristic mass of $10^{13} h^{-1} M_\odot$ is consistent with the mass scale
where galaxy formation models suggest a transition from efficient to inefficient
cooling. Another characteristic halo mass scale, $M \sim 10^{11} h^{-1} M_\odot$,
which cannot be probed directly by our groups, is inferred from the conditional
luminosity function (CLF) that matches the observed galaxy luminosity function and
clustering. For a halo of given mass, the distribution of $L_c$ is rather narrow.
Detailed comparison with mock galaxy redshift surveys indicates that this implies a
fairly deterministic relation between $L_c$ and halo mass. The satellite galaxies,
however, are found to follow a Poissonian number distribution, in excellent agreement
with the occupation statistics of dark matter subhaloes. This provides strong support
for the standard lore that satellite galaxies reside in subhaloes. The central
galaxies in low-mass haloes are mostly late type galaxies, while those in massive
haloes are almost all early types. We also measure the CLF of galaxies in haloes of
given mass. Over the mass range that can be reliably probed with the present data
($13.3 \lesssim \log[M/(h^{-1} M_\odot)] \lesssim 14.7$), the CLF is reasonably well
fit by a Schechter function. Contrary to recent claims based on semi-analytical
models of galaxy formation, the presence of central galaxies does not show up as a
strong peak at the bright end of the CLF. The CLFs obtained from the observational
data are in good agreement with the CLF model obtained by matching the observed
luminosity function and large-scale clustering properties of galaxies in the
standard ΛCDM model.

Key words: dark matter - large-scale structure of the universe - galaxies: haloes - 
methods: statistical 



1 INTRODUCTION 

According to the current paradigm of structure formation, 
galaxies form and reside inside extended cold dark mat- 
ter (CDM) haloes. These haloes are virialized clumps that 
formed through the gravitational instability of the cosmic 
density field, and have typical sizes that are much smaller 
than their mean spatial separation. One of the ultimate chal- 
lenges in astrophysics is to obtain a detailed understand- 
ing of how galaxies with different physical properties oc- 
cupy dark matter haloes of different mass. This link between 
galaxies and dark matter haloes is an imprint of various
complicated physical processes related to galaxy formation, 
such as gravitational instability, gas cooling, star formation, 
merging, tidal stripping and heating, and a variety of feed- 
back processes. A detailed quantification of this link is there- 
fore pivotal for our understanding of galaxy formation and 
evolution within the CDM cosmogony. Although the statis- 
tical link itself does not give a physical explanation of how 
galaxies form and evolve, it provides important constraints 
on these processes and on how their efficiencies scale with 
halo mass. 

To quantify the relationship between haloes and galax- 
ies in a statistical way, it has become customary to specify 
the so-called halo occupation distribution, $P(N|M)$, which
gives the probability to find N galaxies (with some speci- 
fied properties) in a halo of mass M. This occupation dis- 
tribution can be constrained using data on the clustering 
properties of galaxies, as it completely specifies the galaxy 
bias, and has been used extensively to study the galaxy dis- 
tribution in dark matter haloes and the galaxy clustering 
on large scales (Jing, Mo & Börner 1998; Peacock & Smith
2000; Seljak 2000; Scoccimarro et al. 2001; Jing, Börner &
Suto 2002; Berlind & Weinberg 2002; Bullock, Wechsler & 
Somerville 2002; Scranton 2002; Kang et al. 2002; Marinoni 
& Hudson 2002; Zheng et al. 2002; Magliocchetti & Por- 
ciani 2003; Berlind et al. 2003; Zehavi et al. 2004a,b; Zheng 
et al. 2004). 

Since individual galaxies are not featureless objects, 
but have diverse intrinsic properties, a more useful halo 
occupation model should contain some information regard- 
ing the physical properties of the galaxies. A significant 
step in this direction has been taken by Yang, Mo & 
van den Bosch (2003b) and van den Bosch, Yang & Mo 
(2003a), who modelled the halo occupation as a function 
of both galaxy luminosity and type (see also Vale & Ostriker
2004 for a somewhat different approach). In particular,
they introduced the conditional luminosity function (CLF),
$\Phi(L|M)\,dL$, which gives the average number of galaxies with
luminosity $L \pm dL/2$ that reside in a halo of mass $M$. As
shown by Yang et al. (2003b), once the galaxy luminosity
function and the galaxy correlation amplitude as a function
of luminosity are known, tight constraints on $\Phi(L|M)$
can be obtained. Detailed comparisons with additional data
from the 2dFGRS, the Sloan Digital Sky Survey (SDSS) and 
DEEP2 have shown that the resulting halo occupation mod- 
els can reproduce a large number of observations regarding 
the galaxy distribution at low redshift (Yan, Madgwick & 
White 2003; Yang et al. 2004a; Mo et al. 2004; Wang et 
al. 2004; Zehavi et al. 2004b; Yan, White & Coil 2004). This 
not only implies that these occupation distributions provide 
a reliable description of the connection between galaxies and 
CDM haloes, it also implies that the standard ΛCDM model
is a good approximation to the real Universe. After all, the 
abundances and clustering properties of dark matter haloes 
are cosmology dependent, and matching the data with oc- 
cupation models is only possible for a restricted set of cos- 
mological parameters (Zheng et al. 2002; van den Bosch et 
al. 2003b; Rozo, Dodelson & Frieman 2004; Abazajian et 
al. 2004). 

An important shortcoming of these occupation models, 
however, is that the results are not completely model
independent. Typically, assumptions have to be made regarding
the functional form of either $P(N|M)$ or $\Phi(L|M)$. For
example, in our work on the CLF we have always assumed
that it is well described by a Schechter function (Yang et 
al. 2003b; van den Bosch et al. 2003a, 2004b). Recently, 
however, the validity of this assumption was questioned by 
Zheng et al. (2004), based on a study of the conditional 
baryonic mass function (similar to the CLF but with lumi- 
nosity replaced by baryonic mass) in semi-analytical models 
of galaxy formation. Note that in all halo occupation stud- 
ies to date, the occupation distributions have been deter- 
mined in an indirect way: the free parameters of the assumed 
functional form are constrained using statistical data on the 
abundance and clustering properties of the galaxy population.
Ideally, however, one would determine the occupation
distribution more directly, by using a method that can de- 
termine which galaxies belong to the same dark matter halo. 
If such a method can be found, the occupation statistics, in- 
cluding the CLF, can be obtained directly from the data 
without the need to make any assumptions. 

In this paper we perform such a direct determination of 
the occupation statistics using the halo-based group finder 
developed by Yang et al. (2004b). Detailed tests with mock 
galaxy catalogues have shown that this group finder is very 
successful in associating galaxies according to their common 
dark matter haloes (Yang et al. 2004b, c). In particular, the 
group finder performs reliably not only for rich systems, but 
also for poor systems, including isolated central galaxies in 
low-mass haloes, making it possible to study the galaxy- 
halo connection for a wide range of different systems. In 
this paper, we use the sample of galaxy groups obtained 
from the 2dFGRS with this group finder to study the galaxy 
occupation statistics in dark matter haloes as a function of 
halo mass, galaxy luminosity and type, and in terms of both 
central and satellite galaxies. The arrangement of the paper 
is as follows: In Section 2 we describe the data and the mock 
surveys used in the present paper. Sections 3 and 4 present
our results on the halo occupation distribution and on the 
CLF. Further discussion and a summary of our results are 
given in Section 5. 



2 GROUP CATALOGUES 

2.1 Galaxy Groups in the 2dFGRS 

Here we briefly describe the group catalogues used in the
analyses that follow. The construction of these catalogues,
as well as numerous tests regarding the performance of the
group finder, is described in Yang et al. (2004b, hereafter
YMBJ), to which we refer the interested reader for details.

The basic idea behind the group finder developed
by YMBJ is similar to that of the matched filter algorithm
developed by Postman et al. (1996) (see also Kepner
et al. 1999; White & Kochanek 2002; Kim et al. 2002;
Kochanek et al. 2003; van den Bosch et al. 2004a,b),
although it also makes use of the galaxy kinematics. The group
finder starts with an assumed mass-to-light ratio to assign
a tentative mass to each potential group (identified using
the Friends-of-Friends (FOF) method). This mass is used to
estimate the size and velocity dispersion of the underlying
halo that hosts the group, which in turn are used to determine
group membership (in redshift space). This procedure
is iterated until no further changes occur in group
memberships. The performance of the group finder was tested in
terms of completeness of true members and contamination
by interlopers, using detailed mock galaxy redshift surveys.
The average completeness of individual groups is ~90 percent,
with only ~20 percent interlopers. Furthermore,
the resulting group catalogue is insensitive to the initial
assumption regarding the mass-to-light ratios, and the group
finder is more successful than the conventional FOF method
(e.g., Eke et al. 2004 and references therein) in associating
galaxies according to their common dark matter haloes.

In YMBJ we used this group finder to identify galaxy
groups in the final public data release of the 2dFGRS. This
redshift sample of galaxies contains about 250,000 galaxies
and is complete to an extinction-corrected apparent magnitude
of $b_J \approx 19.45$ (Colless et al. 2001). When identifying
galaxy groups, we restricted ourselves to galaxies with
redshifts $0.01 < z < 0.20$ in the North Galactic Pole (NGP)
and the South Galactic Pole (SGP) regions. Only galaxies
with redshift quality parameter q > 3 and with redshift
completeness > 0.8 were used. This left a grand total of 151,280
galaxies with a sky coverage of 1124 deg$^2$. From this sample,
YMBJ obtained a group catalogue that contains 78,708 systems,
of which 7251 are binaries, 2343 are triplets, and 2502
are systems with four or more members. In what follows we
use this group catalogue to determine the halo occupation
statistics of the 2dFGRS.

Figure 1. The left-hand panel shows the relation between the group luminosity $L_{18}$ and the mean separation $d$ of all groups with a
group luminosity larger than $L_{18}$. Different lines correspond to different 'mass-limited' group samples obtained from the 2dFGRS (see
Table 1). The small differences at large $d$ are due to cosmic variance. The right-hand panel shows the relation between the halo mass
$M$ and mean halo separation $d$ derived from the mass function of dark matter haloes for two ΛCDM cosmologies with different $\sigma_8$, as
indicated. Throughout this paper we compute halo masses from group luminosities as follows: for a group with given $L_{18}$ we determine
the mean separation $d$ between all groups with a group luminosity larger than $L_{18}$, and use the panel on the right to determine the halo
mass $M$ that corresponds to this $d$ for the cosmology under consideration.



2.2 Mock Group Catalogues 

In testing the halo-based group finder, YMBJ used a set of 
detailed mock galaxy redshift surveys (hereafter MGRSs). 
Here we use these same MGRSs for comparison with the 
2dFGRS. For the present analysis, we correct these MGRSs 
for close pair incompleteness that arises from fiber collisions 
and from the fact that nearby galaxies overlap (so that they 
are identified as a single galaxy, rather than a galaxy pair). 
The method used to correct our MGRSs for both these ef- 
fects is described in detail in van den Bosch et al. (2004b). 
Note that this close-pair incompleteness has only a minor
impact on our results: leaving it uncorrected would not
change any of our main conclusions. In what follows, we
give a brief description of how these MGRSs are constructed,
and we refer the reader to Yang et al. (2004a) and
van den Bosch et al. (2004a) for details.

The mock surveys are constructed by populating dark 
matter haloes in large numerical simulations with galaxies 
of different luminosities and different types. The simulations
correspond to a ΛCDM concordance cosmology with
$\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$,
$h = H_0/(100\,{\rm km\,s^{-1}\,Mpc^{-1}}) = 0.7$
and with a scale-invariant initial power spectrum with
normalization $\sigma_8 = 0.9$; all MGRSs discussed in this
paper are therefore only valid for this particular cosmology.
To populate the dark matter haloes with galaxies we use
the CLF. Because of the mass resolution of the simulations
and because of the completeness limit of the 2dFGRS we
adopt a minimum galaxy luminosity of $L_{\rm min} = 10^7 h^{-2} L_\odot$
throughout. The mean number of galaxies with $L > L_{\rm min}$
that reside in a halo of mass $M$ is given by

$$\langle N\rangle_M = \int_{L_{\rm min}}^{\infty} \Phi(L|M)\,dL \qquad (1)$$

In order to Monte-Carlo sample occupation numbers for
individual haloes one requires the full probability distribution
$P(N|M)$ (with $N$ an integer), of which $\langle N\rangle_M$ gives the mean,
i.e.,

$$\langle N\rangle_M = \sum_{N=0}^{\infty} N\,P(N|M) \qquad (2)$$

We use the results of Kravtsov et al. (2004a), who have
shown that the number of subhaloes follows a Poisson
distribution. In what follows we differentiate between satellite
galaxies, which we associate with these dark matter
subhaloes, and central galaxies, which we associate with the
host halo. The total number of galaxies per halo is the sum
of $N_{\rm cen}$, the number of central galaxies, which is either one
or zero, and $N_{\rm sat}$, the (unlimited) number of satellite
galaxies. We assume that $N_{\rm sat}$ follows a Poisson distribution and
require that $N_{\rm sat} = 0$ whenever $N_{\rm cen} = 0$. The halo
occupation distribution is thus specified as follows: if $\langle N\rangle_M \le 1$
then $N_{\rm sat} = 0$ and $N_{\rm cen}$ is either zero (with probability
$P = 1 - \langle N\rangle_M$) or one (with probability $P = \langle N\rangle_M$). If
$\langle N\rangle_M > 1$ then $N_{\rm cen} = 1$ and $N_{\rm sat}$ follows the Poisson
distribution

$$P(N_{\rm sat}|M) = e^{-\mu}\,\frac{\mu^{N_{\rm sat}}}{N_{\rm sat}!} \qquad (3)$$

with $\mu = \langle N_{\rm sat}\rangle_M = \langle N\rangle_M - 1$.

Table 1. 'Mass-limited' group samples from the 2dFGRS

We follow Yang et al. (2004a) and van den Bosch et
al. (2004b) and assume that the central galaxy is the brightest
galaxy in each halo. Its luminosity is drawn from $\Phi(L|M)$
with the restriction that $L > L_1$, with $L_1$ defined by

$$\int_{L_1}^{\infty} \Phi(L|M)\,dL = 1. \qquad (4)$$

The luminosities of the satellite galaxies are also drawn at
random from $\Phi(L|M)$, but with the restriction
$L_{\rm min} < L < L_1$.
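
Equation (4) can be solved numerically once a functional form for $\Phi(L|M)$ is chosen. The sketch below is only an illustration under stated assumptions: it takes a Schechter form, $\Phi(L|M)\,dL = \Phi^* x^{\alpha} e^{-x}\,dx$ with $x = L/L^*$, integrates it with a trapezoidal rule in $\ln x$, and finds $x_1 = L_1/L^*$ by bisection. The function names, the cutoff `x_max`, and any parameter values passed in are hypothetical and are not taken from the CLF models cited in the text.

```python
import math


def n_brighter(x, phi_star, alpha, x_max=60.0, steps=4000):
    """Mean number of galaxies with L/L* > x for a Schechter-type CLF.

    Evaluates phi_star * integral_x^x_max t^alpha exp(-t) dt using the
    trapezoidal rule in ln(t). x_max is an assumed cutoff beyond which
    the exponential tail is negligible.
    """
    if x >= x_max:
        return 0.0
    lo, hi = math.log(x), math.log(x_max)
    step = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = math.exp(lo + i * step)
        weight = 0.5 if i in (0, steps) else 1.0
        # dt = t dln(t), hence the extra power of t in the integrand.
        total += weight * t ** (alpha + 1.0) * math.exp(-t)
    return phi_star * total * step


def solve_l1(phi_star, alpha, x_min, x_max=60.0):
    """Return x1 = L_1/L* such that n_brighter(x1) = 1 (eq. 4).

    Returns None when the mean occupation above x_min is below one,
    in which case no L_1 exists inside the sampled luminosity range.
    """
    if n_brighter(x_min, phi_star, alpha, x_max) < 1.0:
        return None
    lo, hi = math.log(x_min), math.log(x_max)
    for _ in range(60):  # bisection in ln(x)
        mid = 0.5 * (lo + hi)
        if n_brighter(math.exp(mid), phi_star, alpha, x_max) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))
```

With this convention `n_brighter(x_min, ...)` plays the role of $\langle N\rangle_M$ in eq. (1), so the same routine serves both integrals. For $\alpha = 0$ the integral is analytic, $\Phi^* e^{-x}$, which gives a simple check of the solver.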

Note that the resulting occupation statistics are not
fully Poissonian. To investigate whether such a deviation
from Poissonian can be detected from the statistics of galaxy
groups we also, for comparison, construct a MGRS in which
the full $P(N|M)$ is Poissonian (not only that of the
satellites), and in which all $N$ galaxies are drawn from the CLF
without any restriction other than $L > L_{\rm min}$. In what follows
we refer to the MGRSs based on the $L_1$-restricted luminosity
sampling as our 'fiducial' mocks, and to those with the
unrestricted, Poissonian sampling as the 'Poisson' mocks.
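
The two occupation-number schemes can be sketched as follows. This is a minimal Python illustration, not the code used to build the MGRSs: the function names are hypothetical, the Poisson sampler is Knuth's textbook multiplication method (adequate only for modest means, since exp(-mu) underflows for very large mu), and the boundary case of a mean occupation exactly equal to one is assigned to the first branch by assumption.

```python
import math
import random


def sample_poisson(mu, rng):
    """Draw one Poisson variate with mean mu (Knuth's multiplication method)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_fiducial(mean_n, rng):
    """Return (N_cen, N_sat) for a halo with mean occupation mean_n.

    Mirrors the 'fiducial' rule: at most one central, and Poisson
    satellites only when the mean occupation exceeds one.
    """
    if mean_n <= 1.0:
        # A central with probability <N>_M, never any satellites.
        n_cen = 1 if rng.random() < mean_n else 0
        return n_cen, 0
    # One central plus Poisson satellites with mean mu = <N>_M - 1.
    return 1, sample_poisson(mean_n - 1.0, rng)


def sample_poisson_mock(mean_n, rng):
    """Return the total N for the 'Poisson' mocks: a pure Poisson draw."""
    return sample_poisson(mean_n, rng)
```

Both samplers have the same mean, but the fiducial one has variance $\langle N\rangle_M - 1$ for $\langle N\rangle_M > 1$, compared with $\langle N\rangle_M$ for the pure Poisson case, which is the sense in which the fiducial statistics are not fully Poissonian.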

The positions and velocities of the galaxies with respect 
to the halo center-of-mass are drawn assuming that the cen- 
tral galaxy in each halo resides at rest at the center. The 
satellite galaxies follow a number density distribution that 
is identical to that of the dark matter particles, and are as- 
sumed to be in isotropic equilibrium within the dark matter 
potential. To construct MGRSs we use the same selection 
criteria and observational biases as in the 2dFGRS, making 
detailed use of the survey masks provided by the 2dFGRS 
team (Colless et al. 2001; Norberg et al. 2002). Using a set of 
independent numerical simulations, we construct 8 indepen- 
dent MGRSs which we use to address scatter due to cosmic 
variance. The MGRSs thus constructed accurately match 
the clustering properties, the apparent magnitude distribu- 
tion and the redshift distribution of the 2dFGRS, allowing 
for a direct comparison. Finally, for each MGRS we con- 
struct group samples using the same halo-based group finder 
and the same group selection criteria as for the 2dFGRS. 

Our fiducial MGRS, used throughout this paper, is
based on the best-fit CLF listed in Table 1 (the model with
ID A . 9) of van den Bosch et al. (2004b). This CLF
predicts an average mass-to-light ratio on the scale of clusters
of $(M/L)_{\rm cl} = 500h\,(M/L)_\odot$. Although this is in fair
agreement with independent observational constraints (e.g., Carlberg
et al. 1996; Bahcall et al. 2000, but see also Tully 2003),
we have shown that both the pairwise peculiar velocity
dispersions and the group multiplicity function of the 2dFGRS
suggest a significantly higher cluster mass-to-light ratio of
$(M/L)_{\rm cl} = 900h\,(M/L)_\odot$ (Yang et al. 2004a,b). We
therefore also construct a set of MGRSs based on the CLF with
$(M/L)_{\rm cl} = 900h\,(M/L)_\odot$ (see Yang et al. 2004b), using the
same sampling strategy as with our fiducial mocks. Although
the model with $(M/L)_{\rm cl} = 500h\,(M/L)_\odot$ is preferred by the
observed galaxy-galaxy clustering strength, the data are not
sufficient to rule out a cluster mass-to-light ratio as high as
$900h\,(M/L)_\odot$ (see van den Bosch et al. 2004b).

Notes to Table 1. Column (1) indicates the sample ID. The selection criteria used to
define these samples are indicated in Columns (2) and (3), which
list the maximum redshift $z_{\rm max}$ (sample galaxies obey $0.01 < z < z_{\rm max}$)
and the minimum group luminosity $L_{18,{\rm min}}$, respectively.
Columns (4) and (5) list the number of groups in each sample,
$N_{\rm grp}$, and the mean group separation, $d$, respectively. Finally,
column (6) lists the corresponding minimum halo mass, $M_{\rm min}$,
obtained using the relation between $d$ and $M$ shown in the
right-hand panel of Fig. 1 (assuming $\sigma_8 = 0.9$).



2.3 Ranking halo mass according to group 
luminosity 

In order to infer halo occupation statistics from our group 
samples it is crucial that we can estimate the halo masses 
associated with the groups. For individual, rich clusters one 
could in principle estimate halo masses using the kinematics 
of the member galaxies, gravitational lensing of background 
sources, or the temperature profile of the X-ray emitting 
gas. For most groups, however, no X-ray emission has been 
detected, and no lensing data is available. In addition, the 
vast majority of the groups in our sample contain only a 
few members, making a dynamical mass estimate based on 
its members extremely unreliable. We thus need to adopt a 
different approach to estimate halo masses. 

As discussed in YMBJ, for each group one can define a
characteristic luminosity, $L_{18}$, as the total luminosity
of all group members brighter than $M_{b_J} - 5\log h = -18$.
For groups at relatively high redshift $L_{18}$ cannot be
measured directly because of the apparent magnitude limit of
the survey. For these groups we estimate $L_{18}$ from the
luminosity of the observable group members, using a correction
factor that is calibrated using relatively nearby groups
(see Yang et al. 2004b,c for details). Tests with MGRSs
have shown that $L_{18}$ is tightly correlated with the mass of
the dark matter halo hosting the group. As shown in Yang
et al. (2004c), ranking groups according to $L_{18}$ is therefore
similar to mass-ranking, allowing the construction of reliable,
'mass-limited' group samples. In Table 1, we list the
'mass-limited' group samples used in this paper. Each sample
is specified by two selection criteria: a lower limit on $L_{18}$,
which as we argued above translates into a lower limit on
halo mass, and a maximum redshift $z_{\rm max}$. The latter assures
that each sample is complete to some absolute magnitude
limit, which is required for a meaningful comparison of the
group member galaxies.

In order to convert the $L_{18}$-ranking to the corresponding
halo mass, $M$, we use the mean group separation, $d = n^{-1/3}$,
as a mass indicator. Here $n$ is the number density of all
groups brighter (in terms of $L_{18}$) than the group under
consideration. In the left panel of Fig. 1, we plot the mean
relation between the group luminosity $L_{18}$ and the mean
group separation $d$ for 2dF groups, with different lines
corresponding to different 'mass-limited' subsamples. Overall
the $L_{18}$-$d$ relation is similar for different subsamples. The
small, but noticeable, differences reflect cosmic variance due
to the presence of a few very large structures in the 2dFGRS
(see e.g., Baugh et al. 2004). Since $L_{18}$ is tightly
correlated with halo mass, we can convert $d$ to $M$. Unfortunately,
this conversion requires knowledge of the halo mass
function, and thus knowledge of the cosmological parameters.
As discussed in Section 2.2, throughout this paper we
consider a ΛCDM concordance cosmology with $\sigma_8 = 0.9$. To
illustrate how sensitively the $d$-to-$M$ conversion depends on
the rather uncertain power-spectrum normalization parameter,
the right-hand panel of Fig. 1 plots the $M$-$d$ relations
for both $\sigma_8 = 0.9$ and $\sigma_8 = 0.7$. For haloes with masses
$M \lesssim 10^{13.5} h^{-1} M_\odot$, the $M$-$d$ relation is virtually
independent of $\sigma_8$. For $M \gtrsim 10^{13.5} h^{-1} M_\odot$, however, $d$ is smaller
in the $\sigma_8 = 0.9$ cosmology simply because massive haloes
are more abundant in cosmologies with larger $\sigma_8$. Unless
specifically stated otherwise, we use the $\sigma_8 = 0.9$ model to
convert $d$ to $M$, but we emphasize that any function of $M$
can be converted back into a function of $d$ using the relation
represented by the solid curve in the right-hand panel
of Fig. 1.
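
The ranking procedure described above can be sketched in two steps: convert each group's rank in $L_{18}$ into a mean separation $d = n^{-1/3}$, then look up the halo mass with the same $d$ in a tabulated $M$-$d$ relation. The Python below is a minimal illustration with hypothetical function names. Two choices in it are assumptions rather than details given in the text: each group is counted in its own number density (so the brightest group has rank 1), and the $M$-$d$ table is interpolated linearly in $\log d$. The table itself must come from a halo mass function for the adopted cosmology and is not reproduced here.

```python
import bisect
import math


def mean_separations(l18_values, volume):
    """Return d = n^(-1/3) for every group, in the input order.

    n is the number density of groups at least as bright in L18 as the
    group considered; volume is the survey volume in the units of d^3.
    """
    order = sorted(range(len(l18_values)),
                   key=lambda i: l18_values[i], reverse=True)
    separations = [0.0] * len(l18_values)
    for rank, index in enumerate(order, start=1):
        number_density = rank / volume
        separations[index] = number_density ** (-1.0 / 3.0)
    return separations


def mass_from_separation(d, table):
    """Interpolate log10(M) at mean separation d.

    table is a list of (d, log10_mass) pairs sorted by increasing d,
    e.g. derived from a cumulative halo mass function.
    """
    ds = [row[0] for row in table]
    if not ds[0] <= d <= ds[-1]:
        raise ValueError("d lies outside the tabulated range")
    j = bisect.bisect_left(ds, d)
    if ds[j] == d:
        return table[j][1]
    (d0, m0), (d1, m1) = table[j - 1], table[j]
    frac = (math.log(d) - math.log(d0)) / (math.log(d1) - math.log(d0))
    return m0 + frac * (m1 - m0)
```

Because the mapping uses only the rank order of $L_{18}$, any monotonic rescaling of the group luminosities leaves the assigned masses unchanged; all cosmology dependence enters through the table.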



3 HALO OCCUPATION STATISTICS FROM 
2DFGRS GROUPS 

Having assigned 2dFGRS galaxies into groups according to 
their common dark matter haloes, we now present a detailed 
investigation of the halo occupation statistics, describing 
how galaxies with different physical properties are associ- 
ated with dark matter haloes of different mass. 

3.1 The halo occupation distribution 

The upper three panels of Fig. 2 plot the occupation
numbers of galaxies in 2dFGRS groups as a function of $d$ (which,
as discussed above, can be used as a proxy for halo mass).
Results are shown for galaxies with $M_{b_J} - 5\log h < -18.0$
(left panel), $-19.0$ (middle panel) and $-20.0$ (right panel),
respectively. These occupation numbers are obtained using
the summation $N = \sum_i 1/c_i$, with $c_i$ the completeness in
the 2dFGRS at the position of galaxy $i$, so that $N$ is not
necessarily an integer. The lower three panels of Fig. 2 plot
the same occupation numbers but this time obtained from
our fiducial MGRS, using exactly the same method as for
the 2dFGRS. Note that in the construction of the MGRS
we use completeness maps of the 2dFGRS. Therefore, we
can compute exactly the same $N$ for our mock groups (i.e.,
the summation of $1/c_i$) as in the real 2dFGRS. Although the
occupation statistics of the groups in the MGRS reveal overall
the same behavior as those in the 2dFGRS, there are some
noticeable differences, which we quantify in more detail
below.
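
As a concrete reading of the completeness-weighted count, the short Python sketch below sums $1/c_i$ over the group members brighter than a given absolute-magnitude limit. The function name and the input layout (pairs of absolute magnitude and completeness) are hypothetical and chosen only for illustration.

```python
def weighted_occupation(members, mag_limit):
    """Completeness-corrected occupation number N = sum_i 1/c_i.

    members is an iterable of (abs_mag, completeness) pairs, where
    abs_mag stands for M_bJ - 5 log h and completeness is the survey
    completeness at the galaxy's position. Only members brighter than
    mag_limit (i.e. with abs_mag < mag_limit) are counted.
    """
    total = 0.0
    for abs_mag, completeness in members:
        if not 0.0 < completeness <= 1.0:
            raise ValueError("completeness must lie in (0, 1]")
        if abs_mag < mag_limit:  # brighter means more negative
            total += 1.0 / completeness
    return total
```

For a group with members at completeness 1.0, 0.8 and 0.5 above the limit, the count is 1 + 1.25 + 2 = 4.25, which shows why the plotted occupation numbers need not be integers.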

The upper panels of Fig. 3 plot the mean halo occupation
numbers, $\langle N\rangle$, as a function of $d$ for the same samples
of 2dFGRS groups as in Fig. 2. Using the $M$-$d$ relations
shown in Fig. 1 we convert these into the average occupation
numbers as a function of halo mass shown in the bottom
panels. Note that the average occupation numbers increase
with halo mass, as expected. At the low mass end,
however, they reveal a relatively flat shoulder and a sharp
break, both at $\langle N\rangle \sim 1$. This sharp break seems to indicate
an almost deterministic relation between the luminosity of
the central (brightest) galaxy in each halo and the mass
of the halo, while the shoulder suggests that the second
brightest galaxy is significantly fainter than the brightest
one (e.g., Zheng et al. 2004). The dotted curves with
errorbars indicate the occupation numbers obtained from the
groups in our fiducial MGRS. These reveal an almost identical
shoulder plus break. The dashed lines, however, indicate
the $\langle N\rangle_M$ obtained directly from the CLF used to construct
the MGRS (computed from eq. [1] with $L_{\rm min}$ the minimum
luminosity of the sample under consideration). The agreement
of these true occupation statistics with those inferred
from the MGRS group catalogues is remarkably good at
$\langle N\rangle \gtrsim 1$, indicating that the conversion of $L_{18}$ to halo mass
via the mean separation $d$ does not introduce any systematic
error. It also shows that the groups identified with the
method developed in Yang et al. (2004b) can be used to
accurately probe halo occupation statistics. For $\langle N\rangle \lesssim 1$,
however, there is a noticeable discrepancy between the true
$\langle N\rangle_M$ and that obtained from the MGRS. In particular, the
shoulder and sharp break at $\langle N\rangle \sim 1$ visible in the mean
occupation numbers of the mock groups are not present in
the true $\langle N\rangle_M$. The origin of this discrepancy is easy to
understand if one takes the stochasticity of the occupation
numbers into account. Since we estimate halo masses from
the $L_{18}$-ranking, halo masses are overestimated if they
happen to contain a relatively bright galaxy (compared to the
mean). Similarly, haloes with a relatively faint galaxy
compared to the mean will have their masses underestimated. If
the average luminosity is close to the luminosity limit of the
sample, this stochasticity in the occupation statistics causes
a systematic deviation from the true $\langle N\rangle_M$, as the haloes
with the relatively faint galaxies will not make the sample
selection criteria. Caution is therefore required in interpreting
the sharp break and the shoulder around $N = 1$ seen in
the occupation statistics of the 2dFGRS groups. Although
they may still be real, we cannot rule out that they are
simply artifacts due to the combined effect of stochasticity and
the magnitude limit.

The lower panels of Fig. 3 also show that our fiducial
MGRSs predict too many galaxies per group at the high
mass end. Given the errorbars, which reflect the scatter
among 8 fiducial MGRSs, these differences are very significant.
As we discuss in Section 3.2, this reflects a problem
with the number of satellite galaxies, and suggests either a
high mass-to-light ratio on the scale of galaxy clusters, or a
reduction of the power-spectrum normalization $\sigma_8$ from the
fiducial value of 0.9 to ~0.7. This is easy to understand:
increasing the mass-to-light ratio on the scale of clusters
basically implies fewer galaxies per cluster. Since the total
number density of galaxies is conserved (constrained by
the galaxy luminosity function), these galaxies now need to
be distributed over lower mass haloes. Therefore, increasing
$(M/L)_{\rm cl}$ decreases $\langle N\rangle$ for the most massive haloes, while
(mildly) increasing $\langle N\rangle$ for the less massive haloes. Note
that, since less massive haloes are less strongly clustered, an
increase of $(M/L)_{\rm cl}$ lowers the overall clustering strength of
the galaxy population. However, as shown in van den Bosch,
Mo & Yang (2003b) and van den Bosch et al. (2004a), there
is a sufficient amount of freedom in the data to allow us
to modify $(M/L)_{\rm cl}$ and still get a reasonable match to the
data. It is this freedom that we exploit here to argue for a
high value of $(M/L)_{\rm cl}$. An alternative solution to the
mismatch between the $\langle N\rangle_M$ of the MGRS and the 2dFGRS is to lower
$\sigma_8$. Lowering $\sigma_8$ reduces the number of massive haloes, but
changes the clustering strength of haloes at a given
mass only slightly (the decrease in the clustering strength of dark
matter particles is largely compensated by the increase in the
bias factor). If the value of $(M/L)_{\rm cl}$ is fixed, lowering $\sigma_8$
requires more galaxies to be assigned to lower-mass haloes,
and the net effect on galaxy clustering is similar to that with
a higher value of $(M/L)_{\rm cl}$ (Yang et al. 2004a; van den Bosch
et al. 2004b).

Figure 2. The halo occupation number distributions for groups in the 2dFGRS (upper panels) and one realization of the fiducial MGRS
(lower panels) as a function of the group mean separation $d$. The left-, middle and right-hand panels correspond to different 'mass-limited'
samples with an absolute magnitude limit in $M'_{b_J} = M_{b_J} - 5\log h$. Note that the occupation numbers are corrected for incompleteness
effects, which explains their non-integer nature.

In addition to the mean occupation numbers, we also
investigate the second moment of the halo occupation
distribution. This quantity is required in the modelling of the
two-point correlation function of galaxies on small scales (e.g.,
Benson et al. 2000; Berlind et al. 2003; Yang et al. 2004a),
and holds important information regarding the physical
processes related to galaxy formation. In earlier investigations,
a number of simple models were adopted to describe the
second moment of the halo occupation distribution and its
dependence on halo mass (e.g., Berlind & Weinberg 2002).
With our group samples, we can actually measure this
quantity directly. We present our results in terms of the
ratio between the standard deviation, $\sigma(N)$, and the square
root of the mean, $\sqrt{\langle N\rangle}$. Since for a Poisson distribution
$\sigma(N)/\sqrt{\langle N\rangle} = 1$, this ratio expresses the amount of
stochasticity relative to Poisson. The solid lines in Fig. 4 show the
results obtained from the 2dFGRS, where the three panels
correspond to the same volume-limited samples as in
Figs. 2 and 3. The ratio $\sigma(N)/\sqrt{\langle N\rangle}$ is close to unity in
massive haloes, but reveals a pronounced minimum at low
$M$. This suggests that the halo occupation distribution is
(close to) Poissonian in massive haloes and significantly
sub-Poissonian in low mass haloes.
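
For orientation, the ratio $\sigma(N)/\sqrt{\langle N\rangle}$ can be computed from any list of occupation numbers, and compared with what the 'central plus Poisson satellites' model of Section 2.2 implies for the true (not group-inferred) statistics. The Python sketch below is illustrative only; the function names are hypothetical, and the analytic expression is a direct consequence of that model (a Bernoulli central for $\langle N\rangle \le 1$, variance $\langle N\rangle - 1$ otherwise), not a fit to the measurements in Fig. 4.

```python
import math


def scatter_ratio(counts):
    """Return sigma(N) / sqrt(<N>) for a list of occupation numbers."""
    mean = sum(counts) / len(counts)
    variance = sum((n - mean) ** 2 for n in counts) / len(counts)
    return math.sqrt(variance / mean)


def fiducial_ratio(mean_n):
    """Analytic sigma(N)/sqrt(<N>) for one central plus Poisson satellites.

    For <N> <= 1 the occupation is Bernoulli with p = <N>, so the
    variance is p(1 - p). For <N> > 1 only the satellites scatter,
    with Poisson variance <N> - 1.
    """
    if mean_n <= 1.0:
        return math.sqrt(1.0 - mean_n)
    return math.sqrt(1.0 - 1.0 / mean_n)
```

In this model the ratio vanishes at $\langle N\rangle = 1$ and approaches unity as $\langle N\rangle$ grows, the same qualitative trend as a minimum at low mass and near-Poissonian scatter in massive haloes.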

Note, however, that because of our method of assigning
masses, Fig. 4 really shows the scatter in $N$ at given $L_{18}$.
In order to test how the scatter in the relation between $M$
and $L_{18}$ impacts on these results, we compare our findings
with those obtained from a MGRS that is identical to our 
fiducial MGRS, except that this time the central galaxy is 
not treated in any special way, so that the occupation distri- 
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Figure 3. The mean halo occupation numbers as a function of mean group separation d (upper panels) and halo mass (lower panels). 
Solid lines correspond to groups in the 2dFGRS. The dotted lines with errorbars in the lower panels show the results obtained from the 
groups in our fiducial MGRSs. The errorbars are obtained from the l-cr scatter among 8 independent MGRSs. The dashed lines in the 
lower panels, labelled 'theory', indicate the true, mean occupation numbers, obtained directly from the CLF (eq. [1]) used to construct 
the MGRS. A comparison with the dotted lines shows that the halo occupation numbers are well recovered, except for groups with 
{N) < 1, where an artificial shoulder and break are introduced. The comparison between 2dFGRS and MGRS shows that our fiducial 
model predicts too many galaxies per group at the high mass end (see text for a detailed discussion). 
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Figure 4. The scatter of the halo occupation number distribution, expressed in terms of the ratio between the standard deviation, <x(7V), 
and the square root of the mean, U (N). Note that this ratio is equal to unity for a Poisson distribution. Results are shown as function of 
halo mass M for the same 'mass-limited' groups samples as in Figs 2 and 3. The solid and dashed lines correspond to the results obtained 
from the groups in the 2dFGRS and the fiducial MGRS, respectively, and are in excellent agreement with each other. The dotted lines 
correspond to a MGRS that is similar to the fiducial one, except that the luminosity of the central galaxy is not treated in a special way 
(i.e., the true occupation statistics are purely Poissonian in this case). Since halo masses are estimated from the ranking of Lis, the ratio 
deviates from unity even for this pure Poisson case. See the text for a detailed discussion regarding the interpretation of these results. 
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Figure 5. The upper panels plot the mean central galaxy luminosity, L c , as function of halo mass, M, with the errorbars indicating the 
l-cr scatter around the mean. The left and middle panel correspond to the 2dFGRS and the fiducial MGRS, respectively, where we have 
used the 'mass-limited' samples V2, V3, and V4 (see Table 1). Solid lines indicate two power-law relations, and are indicated to facilitate 
a comparison. Note the excellent agreement between the 2dFGRS and the MGRS. The upper right-hand panel indicates the same relation 
between L c and M, but this time determined directly from the populated haloes in our simulation box (i.e., without making a mock 
rcdshift surveys from which we select groups). Although the mean L c — M relation is virtually identical to that derived from the mock 
groups, the scatter is significantly larger in low mass haloes (see text for discussion). The lower panels plot the distributions P(L C \M) 
for three different bins in halo mass, as indicated. Solid, dashed, and dotted curves correspond to the 2dFGRS, the fiducial MGRS, and 
the simulation box, respectively. 



bution, P(N\M), is completely Poissonian (see Section 2.2). 
Any deviation of a(N) / (N) from unity in this MGRS is 
therefore completely artificial, allowing us to assess the ro- 
bustness of our findings. The dotted curves in Fig. 4 show 
the results obtained f rom this MGRS. They reveal a small 
minimum in a(N) / U (N) at small M, similar though less 
pronounced than for the 2dFGRS. The origin of this arte- 
fact is similar to that of the artificial shoulder and break in 
the mean occupation numbers. In haloes with (A) ~ 1 one 
expects a significant fraction of haloes with N — 0; in fact, 
for a Poissonian P(N\M) the probability to have N = Ois al- 
most 40 percent. These haloes, however, do not appear in the 
group samples causing an overestimate of {N} and an under- 
estimate of the variance. Therefore, the ratio a(N)/y // (A) 
is underestimated for haloes with (N) ~ 1. The upturn at 
the very low-mass end is due to the (artificial) sharp break 
in (N), which drives u(N)/ \J (A) up again. The presence of 
these artefacts clearly demonstrates the importance of using 
detailed MGRSs to properly interpret the data. 
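
The size of this artefact can be estimated with a small calculation. The sketch below assumes the simplest possible case, a pure Poisson distribution from which every empty halo is discarded, and uses the closed-form moments of the resulting zero-truncated distribution. The function name and return layout are illustrative choices, not part of the analysis described here.

```python
import math


def truncated_poisson_ratio(mean_true):
    """Statistics of a Poisson P(N) after discarding all N = 0 haloes."""
    p0 = math.exp(-mean_true)               # fraction of empty haloes
    keep = 1.0 - p0                         # fraction that enters the sample
    mean_obs = mean_true / keep             # inferred <N>, biased high
    second_moment = (mean_true + mean_true**2) / keep
    var_obs = second_moment - mean_obs**2   # inferred variance, biased low
    return p0, mean_obs, var_obs, math.sqrt(var_obs / mean_obs)


p0, mean_obs, var_obs, ratio = truncated_poisson_ratio(1.0)
print(f"P(N=0) = {p0:.3f}, <N>_obs = {mean_obs:.3f}, "
      f"var_obs = {var_obs:.3f}, ratio = {ratio:.3f}")
```

For a true mean of unity this gives an empty fraction of about 37 percent and a ratio of roughly 0.65, so even a perfectly Poissonian population appears sub-Poissonian once the empty haloes are lost.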

Figure 6. The mean central galaxy luminosity, $L_c$, as function of halo mass, $M$, over a large range in halo masses. The data points are the same as those shown in the upper left panel of Fig. 5. The dashed curve is the $L_c$-$M$ relation given by the CLF obtained from matching the observed luminosity function of galaxies and the correlation length as a function of galaxy luminosity. Note the existence of another characteristic mass scale, $M \sim 10^{11} h^{-1} M_\odot$, below which $L_c$ decreases rapidly with decreasing $M$. This plot indicates that scaling relations such as the Tully-Fisher relation hold only over a limited range of halo masses.

The dashed lines in Fig. 4 show the results obtained
from our fiducial MGRS. These results are in excellent
agreement with those obtained from the 2dFGRS, indicating that
the CLF and the method used for its sampling agree well
with the data. As indicated in Section 2.2, the occupation
statistics of the central galaxies are treated differently than
those of the satellite galaxies: whereas $P(N|M)$ is
Poissonian for the latter, central galaxies follow a much narrower
nearest-integral distribution. This means that $P(N|M)$ is
strongly sub-Poissonian whenever $\langle N \rangle$ is small. As is evident
from a comparison with Fig. 3, the minimum in $\sigma(N)/\sqrt{\langle N \rangle}$
occurs at a halo mass where the average occupation
number is virtually unity. This, together with the fact that the
minimum in $\sigma(N)/\sqrt{\langle N \rangle}$ is much more pronounced than
in the pure-Poissonian MGRS, leads us to conclude that
(i) the number of satellite galaxies above a certain
luminosity limit follows a Poissonian distribution, and (ii) the
occupation statistics of central galaxies are sub-Poissonian,
indicating some deterministic behavior in galaxy formation.
Clearly, a detailed study of the higher-order moments of
the occupation statistics can yield important constraints on
galaxy formation, and we intend to return to this in more
detail in a forthcoming paper.
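
The minimum in the scatter ratio near a mean occupation of unity can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the central count is 0 or 1 (a nearest-integer distribution for a mean of at most one), the satellite count is Poissonian, and the two are independent. The function name is hypothetical.

```python
import math


def ratio_central_plus_satellites(mean_cen, mean_sat):
    """sigma(N)/sqrt(<N>) for N = N_cen + N_sat.

    Assumptions (illustrative only): N_cen is 0 or 1 with P(1) = mean_cen,
    N_sat is Poisson with mean mean_sat, and the two are independent.
    """
    if not 0.0 <= mean_cen <= 1.0:
        raise ValueError("mean_cen must lie in [0, 1]")
    mean = mean_cen + mean_sat
    if mean <= 0:
        raise ValueError("total mean occupation must be positive")
    variance = mean_cen * (1.0 - mean_cen) + mean_sat
    return math.sqrt(variance / mean)


# One guaranteed central plus progressively more satellites.
for mean_sat in (0.01, 1.0, 100.0):
    print(mean_sat, round(ratio_central_plus_satellites(1.0, mean_sat), 3))
```

In this toy model the ratio is far below unity when the central galaxy dominates the count and approaches unity once satellites dominate, which is the qualitative trend described for Fig. 4.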

3.2 Central versus satellite galaxies 

In theoretical models of galaxy formation, galaxies in dark 
matter haloes are usually separated into central galaxies and 
satellite galaxies. Since central and satellite galaxies are ex- 
pected to have somewhat different formation histories (e.g., 
Kauffmann, White & Guiderdoni 1993), it is interesting to 
study the halo occupation distribution separately for these 
two categories of galaxies. By definition, the central galaxy 
in a halo should be the one that is located near the center of 
the host halo. Since in theory the central galaxy is expected 
to be the most massive one among all galaxies in the halo, 
we have defined the brightest galaxy in a group (halo) as 
the central galaxy, and the others as satellite galaxies. 

The upper panels of Fig. 5 plot the relation between
the luminosity of the central galaxy, $L_c$, and the mass of
the host halo, $M$. Results are shown both for groups in the
2dFGRS (left panel) and for those in our fiducial MGRS
(middle panel). We also show the true relation between $L_c$
and $M$ (right panel) obtained directly from the populated
haloes in our simulation box (i.e., without constructing a mock
redshift survey from which we select groups). The mean
$L_c$-$M$ relation is remarkably similar for all three samples,
and well described by a broken power-law with $L_c \propto M^{2/3}$
at $M \lesssim 10^{13} h^{-1} M_\odot$ and $L_c \propto M^{1/4}$ at $M \gtrsim 10^{13} h^{-1} M_\odot$.
At the low-mass end, this is in excellent agreement with
results based on galaxy-galaxy weak lensing, which imply
that $M \propto L_c^{1.5}$ (e.g. Yang et al. 2003a; Guzik & Seljak
2002). At the massive end, $L_c$ only increases very slowly
with halo mass, which is consistent with the recent result
obtained by Lin et al. (2004), indicating that there must
be a physical process that prevents the central galaxies in
massive haloes from growing. One possibility is that
radiative cooling of halo gas becomes negligible in massive haloes,
with $M \sim 10^{13} h^{-1} M_\odot$ the characteristic mass that marks
the transition from effective to ineffective cooling (cf. Dekel
2004). The requirement for such a transition is well known
from semi-analytical models of galaxy formation where it
is required to reproduce the bright end of the observed
luminosity function (e.g., White & Rees 1978; Kauffmann et
al. 1993; Benson et al. 2003; Kang et al. 2004).
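
The broken power law quoted above can be written as a short function. This is a sketch only: the slopes and the break mass come from the text, while the normalisation `lum_break` is a placeholder (set to 1 so that the function returns $L_c$ in units of its value at the break), not a fitted number.

```python
def central_luminosity(mass, mass_break=1e13, lum_break=1.0):
    """Broken power law for the mean central luminosity.

    L_c scales as M^(2/3) below mass_break and as M^(1/4) above it.
    mass and mass_break are in h^-1 M_sun. lum_break is the luminosity
    at the break; its default is a placeholder, not a measured value.
    """
    slope = 2.0 / 3.0 if mass < mass_break else 0.25
    return lum_break * (mass / mass_break) ** slope


# A factor of 10 in mass changes L_c much more below the break than above it.
print(central_luminosity(1e13) / central_luminosity(1e12))
print(central_luminosity(1e14) / central_luminosity(1e13))
```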

The errorbars in the upper panels of Fig. 5 indicate
the scatter around $L_c$ at given $M$. Except for some small
discrepancies at high $M$, the amounts of scatter in the
2dFGRS and MGRS are very similar. This is illustrated more
clearly in the lower panels of Fig. 5, which plot the actual
distributions of $L_c$ for three bins in halo mass (as indicated)
for both the 2dFGRS (solid lines) and the MGRS (dashed
lines). Overall the agreement is remarkably good, providing
strong support for the CLF and its sampling strategy. Note
that the $P(L_c|M)$ look similar to log-normal distributions,
with a fairly narrow width that depends only mildly on halo
mass.

To properly interpret these findings we compare these
$P(L_c|M)$ with those obtained directly from the populated
haloes in the simulation box (i.e., without constructing a mock
redshift survey from which we select groups). As is evident
from the upper-right panel of Fig. 5, at low $M$ the scatter in
the true $L_c$-$M$ relation is much larger than in the relation
inferred from the mock group catalogue. This is also evident
from the lower panels in Fig. 5, which show that the true
$P(L_c|M)$ (dotted curves) in haloes with $M \lesssim 10^{13} h^{-1} M_\odot$
is significantly broader than the inferred distribution. In
particular, the inferred distribution seems to lack
predominantly the low-$L_c$ galaxies. This discrepancy arises from the
stochasticity in the $L_{18}$-$M$ relation. In low-mass haloes,
where the average occupation number is close to unity, $L_{18}$
is basically identical to $L_c$. This means that the $L_{18}$-ranking
becomes similar to $L_c$-ranking, so that the resulting $L_c$-$M$
relation becomes virtually scatter free. In addition, because
of the magnitude limit of the group sample, the haloes with
relatively faint central galaxies are missed, causing a deficit
of low-$L_c$ galaxies. On the other hand, some central galaxies
may be missed in the sample because of observational
selection effects, and so some of the galaxies identified as the
central galaxies are actually the second or even the third
brightest galaxy in a group. This introduces extra
scatter in $L_c$, which causes the scatter at the high-mass end
to be larger for the MGRSs (and the 2dFGRS) than the
true scatter. Therefore, as with the scatter in the
occupation numbers, great care is required when interpreting the
scatter in $P(L_c|M)$. In particular, the log-normal character
of $P(L_c|M)$ of the 2dFGRS galaxies does not necessarily
imply that the true distribution is log-normal. We emphasize,
however, that despite this bias, the comparison between the
2dFGRS and the MGRS is still meaningful. In particular,
the good agreement between both group catalogues suggests
that our method of assigning galaxies to dark matter haloes
used in the construction of the MGRS (see Section 2.2) is in
excellent agreement with the 2dFGRS.

Since our group sample becomes quite incomplete for
haloes with $M \lesssim 10^{12} h^{-1} M_\odot$, we cannot use our groups to
study the $L_c$-$M$ relation for low-mass haloes. However, our
CLF, which is constrained by the abundances and
clustering properties of the galaxy population, does contain such
information. The dashed curve in Fig. 6 shows the $L_c$-$M$
relation obtained from our CLF down to haloes with
$M = 10^{10} h^{-1} M_\odot$. For comparison, we also plot the results
obtained from our 2dFGRS group catalogue, which are in
excellent agreement with these predictions. For haloes with
$M \lesssim 10^{12} h^{-1} M_\odot$, however, no reliable determination of
$L_c(M)$ can be obtained from the groups. This is
unfortunate as our CLF model predicts the presence of a second
characteristic mass scale at $M \sim 10^{11} h^{-1} M_\odot$. For haloes
below this scale $L_c$ is predicted to decrease rapidly with
decreasing $M$ (roughly as $L_c \propto M^{\beta}$, with $\beta \sim 2$-$4$). This is
required in order to match the relatively steep slope of the
halo mass function at the low mass end with the relatively
shallow faint-end slope of the galaxy luminosity function (see
e.g., Yang et al. 2003b), and is often interpreted as 'evidence'
for a suppression of star formation by feedback effects (e.g.
Dekel & Silk 1986; Dekel 2004). Note that the $L_c$-$M$ relation
shown in Fig. 6 suggests that scaling relations such as the
Tully-Fisher relation can hold only over a limited range of
halo masses.
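
The link between the two slopes can be made explicit with a short scaling argument. This is only a sketch: it assumes that each low-mass halo hosts a single galaxy, so that the faint end of the luminosity function is dominated by central galaxies, and it introduces the symbols $\gamma$ (low-mass slope of the halo mass function) and $\alpha$ (faint-end slope of the luminosity function), neither of which is quoted in the text.

```latex
% Assumed power laws: dn/dM \propto M^{-\gamma},  L_c \propto M^{\beta}
\frac{dn}{dL_c}
  = \frac{dn}{dM}\,\frac{dM}{dL_c}
  \propto L_c^{-\gamma/\beta}\; L_c^{1/\beta - 1}
  = L_c^{(1-\gamma)/\beta - 1}
% Matching the exponent to the faint-end slope \alpha:
\frac{1-\gamma}{\beta} - 1 = \alpha
  \quad\Longrightarrow\quad
  \beta = \frac{\gamma - 1}{-(1 + \alpha)}
```

Under these assumptions a steep mass function ($\gamma$ well above 1) combined with a shallow faint end ($\alpha$ only slightly below $-1$) yields a large $\beta$, i.e. a luminosity that drops quickly with decreasing halo mass.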

Let us now move on to satellite galaxies. Fig. 7 plots the
distribution of the number of satellite galaxies in groups, $N_s$,
for a number of different mass bins. The thick solid curves
indicate Poisson distributions with the same $\langle N_s \rangle$, and fit the
$N_s$-distributions extremely well. This is an important result,
because it suggests a direct link between satellite galaxies
and dark matter subhaloes. In a recent study, Kravtsov et
al. (2004), using large numerical simulations, have shown
that the occupation distribution of dark matter subhaloes
follows a Poisson distribution, in excellent agreement with
the occupation statistics of the satellite galaxies shown here.
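
A comparison of this kind can be sketched in a few lines. The example below tabulates the observed fraction of groups with each satellite count next to the Poisson probability for the same mean. The satellite counts are made up for illustration and the function names are hypothetical.

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """Poisson probability of observing k objects for a given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def compare_with_poisson(sat_counts):
    """Return (k, observed fraction, Poisson probability) rows.

    The Poisson curve uses the sample mean of sat_counts, mirroring the
    'same mean' comparison described in the text.
    """
    mean = sum(sat_counts) / len(sat_counts)
    hist = Counter(sat_counts)
    return [(k, hist[k] / len(sat_counts), poisson_pmf(k, mean))
            for k in range(max(sat_counts) + 1)]


# Made-up satellite counts for one hypothetical mass bin.
for row in compare_with_poisson([0, 1, 1, 2, 0, 3, 1, 2]):
    print(row)
```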

For comparison, the dotted lines in Fig. 7 plot the
distributions of $N_s$ obtained from our fiducial MGRS. Unlike with
the central galaxies, for which MGRS and 2dFGRS are in
excellent agreement, the MGRS contains far too many satellite
galaxies in massive systems. This explains the discrepancy
in the average occupation numbers $\langle N \rangle$ at large $M$ shown
in Fig. 3, and is consistent with the findings in some of our
previous studies (Yang et al. 2004a; YMBJ; van den Bosch
et al. 2004b). As discussed in Yang et al. (2004a), there are
two different ways to reduce the number of rich systems.
One is to increase the mass-to-light ratio of clusters, so that
the number of galaxies assigned to a massive halo is reduced.
The other is to reduce the value of $\sigma_8$, so that the number
of massive haloes that can host a large number of satellite
galaxies is reduced. As an illustration, the dashed lines in
Fig. 7 show the results obtained from a MGRS based on the
CLF model with $(M/L)_{\rm cl} = 900h\,(M/L)_\odot$ (see Section 2.2).
This model matches the 2dFGRS distributions much
better. Similar results are obtained if we reduce the value of
$\sigma_8$ to $\sim 0.70$ (not shown). Note that these two models are
also favored by several other observations based on the
2dFGRS, such as the redshift-space clustering of galaxies (Yang
et al. 2004a) and the multiplicity function of galaxy groups
(Yang et al. 2004b).

Before concluding that the satellite occupation
statistics point towards either a high $(M/L)_{\rm cl}$ or a low
$\sigma_8$, it is important to check whether the $L_{18}$ to $M$
conversion (via the mean separation $d$) has introduced
any artefact in this statistic. To test this we determine the
distribution of $N_s$ directly from the populated haloes in the
simulation box (i.e., without constructing a mock redshift survey
from which we select groups). Since we know the halo mass
exactly for each halo in the box, we can compute the number
of satellite galaxies above the magnitude limit listed. The
resulting distributions of $N_s$ are shown in Fig. 7 as thin solid
lines and are in reasonable agreement with the distributions
obtained from the corresponding MGRS (dotted lines).
Although small differences are apparent, the overall trends,
especially the dramatic overprediction of the mean satellite
number, are nicely reproduced. This demonstrates that the
discrepancy between 2dFGRS and MGRS is real,
indicating that either cluster mass-to-light ratios are high or that
$\sigma_8 \sim 0.7$.



3.3 Dependence on galaxy type 

Madgwick et al. (2002) used a principal component analysis
of galaxy spectra taken from the 2dFGRS to obtain a
spectral classification scheme. They introduced the parameter
$\eta$, a linear combination of the two most significant
principal components, as a galaxy type classification measure. As
shown by Madgwick et al. (2002), $\eta$ follows a bimodal
distribution and can be interpreted as a measure for the current
star formation rate in each galaxy. Furthermore $\eta$ is well
correlated with morphological type (Madgwick 2002). In what
follows we adopt the classification suggested by Madgwick et
al. and classify galaxies with $\eta < -1.4$ as 'early-types' and
galaxies with $\eta > -1.4$ as 'late-types'. Each galaxy in our
MGRS is assigned a type (early or late), using the method
described in Yang et al. (2004a).
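
The type split can be expressed as a one-line rule. The sketch below is illustrative only: the threshold is the one quoted above, while the function names and the treatment of the boundary value are assumptions.

```python
ETA_SPLIT = -1.4  # threshold quoted in the text


def galaxy_type(eta):
    """Classify a galaxy as 'early' or 'late' from its spectral parameter eta.

    The text does not say how eta == -1.4 is handled; assigning it to
    'late' is an arbitrary choice made here.
    """
    return "early" if eta < ETA_SPLIT else "late"


def early_fraction(etas):
    """Fraction of early types in a list of eta values (e.g. one mass bin)."""
    return sum(galaxy_type(eta) == "early" for eta in etas) / len(etas)


# Made-up eta values for the members of one hypothetical group.
print(early_fraction([-2.0, -3.1, 0.4, 1.2]))
```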

As shown in van den Bosch et al. (2003a), the observed 
correlation lengths of early and late type galaxies in the 
2dFGRS indicate that the former are preferentially hosted 
by massive haloes. However, these data alone do not con- 
tain sufficient information to accurately constrain the seg- 
regation of the galaxy population in early and late types. 
Here we use the galaxy groups selected from the 2dFGRS 
to directly constrain the occupation statistics of both pop- 
ulations. Fig. 8 plots the fraction of early-type galaxies as a 
function of halo mass. Results are shown separately for all 
(central plus satellite) galaxies (left-hand panel) and for cen- 
tral galaxies only (right-hand panel). The solid, dotted and 
dashed lines correspond to the 'mass-limited' group samples 
V2, V3 and V4, respectively (see Table 1). Among the total
population, the fraction of early-type galaxies increases from
about 25% in haloes with $M \sim 10^{12} h^{-1} M_\odot$ to about 80%
in haloes with $M \sim 10^{15} h^{-1} M_\odot$. Among the central galaxy
population, the increase of the fraction of early types with
mass is stronger: in haloes with $M \gtrsim 10^{14} h^{-1} M_\odot$ virtually
all central galaxies are early types. As a comparison, the
open circles with errorbars in Fig. 8 show the results
obtained from the 'mass-limited' group samples (V2 in Table
1) constructed from the fiducial MGRSs. The model
predictions agree reasonably well with the observational results,
indicating that our model for splitting galaxies in early- and
late-types (see van den Bosch et al. 2003a) is sufficiently
accurate.

Figure 7. Distributions of the number of satellite galaxies in groups for different bins in halo mass, as indicated. Panels in the upper, middle and lower rows correspond to different absolute magnitude limits as indicated, where $M'_{b_J} = M_{b_J} - 5\log h$. The hatched histograms indicate the distributions obtained from the groups in the 2dFGRS. Thick solid curves correspond to Poisson distributions with the same mean $N_s$, and are shown to illustrate the Poissonian nature of $P(N_s|M)$. The dotted and dashed histograms indicate the distributions obtained from the fiducial MGRS and the MGRS with $(M/L)_{\rm cl} = 900h\,(M/L)_\odot$, respectively. Whereas the former dramatically overestimates the average number of satellite galaxies in massive haloes, the latter fits the 2dFGRS results extremely well. The thin, solid lines are the number distributions of satellite galaxies obtained directly from the populated haloes in our fiducial simulation box (i.e., without constructing a mock redshift survey from which we select groups). These therefore reflect the true $P(N_s|M)$. Note the good, overall agreement with the distributions obtained from the fiducial MGRS (dotted curves).

Figure 8. Left-hand panel: The fraction of early-type galaxies in groups of the 2dFGRS as function of halo mass. Solid, dotted and dashed lines correspond to the 'mass-limited' group samples V2, V3, and V4 with different absolute magnitude limits, respectively (see Table 1). As a comparison, the results for groups in the fiducial MGRSs with $M_{b_J} - 5\log h < -18.0$ are shown as circles with errorbars (1-$\sigma$ scatter among 8 fiducial MGRSs). Right-hand panel: Same as left-hand panel, except that now only central galaxies are considered.

4 THE CONDITIONAL LUMINOSITY FUNCTION

Thus far our discussion has only focused on the occupation
number of galaxy groups (dark matter haloes). We now use
the groups in the 2dFGRS to directly determine the
conditional luminosity function, $\Phi(L|M)$, which specifies the
number of galaxies in haloes as a function of luminosity.



4.1 Direct measurement of the CLF from 2dFGRS groups

The left-hand panels of Fig. 9 show the conditional
luminosity functions (CLFs) of galaxies for 2dFGRS groups of
different masses. For comparison, the contributions from satellite
galaxies are shown separately. These CLFs have been
obtained directly by counting galaxies in groups. For a given
galaxy luminosity $L$, there is a limiting redshift, $z_L$, beyond
which galaxies with such a luminosity are not included in
the sample. In order to estimate the CLF, $\Phi(L|M)$, at given
$L$ and $M$ we only use groups with mass $M$ that are within
the redshift limit $z_L$. The errorbars shown correspond to
1-$\sigma$ fluctuations among 8 independent MGRSs and reflect the
expected errors due to cosmic variance. To test the
reliability of the measurements, we compare in the middle panels
the model input CLFs (solid curves) with those obtained
from the mock group samples. The CLFs recovered from
the groups in the MGRS agree well with the model input
down to halo masses of $M \sim 10^{13.3} h^{-1} M_\odot$. For less massive
haloes, however, there is a significant discrepancy. In
particular, the CLFs determined from the groups seem to predict
too few faint satellite galaxies, and too many bright central
galaxies. There are two potential sources for these
discrepancies: (i) the inaccuracy of our group finder for poor systems;
(ii) the error in the $L_{18}$ to $M$ conversion. In order to test
these possibilities, the right-hand panels of Fig. 9 plot the
CLFs for MGRS groups binned according to true halo mass
instead of the $L_{18}$ ranking. This solves the problem at the
bright end, suggesting that this particular discrepancy owes
to errors in the $L_{18}$-$M$ conversion, but still results in too
few faint galaxies. This reflects the incompleteness of our
group finder (see Yang et al. 2004b). Note that, in low-mass
haloes where $L_{18}$ is dominated by the central galaxy, the
$L_{18}$-$M$ conversion produces an artificial peak in the CLF
at the bright end (see the bottom middle panel). Such a
peak is also seen in the observational data (the bottom left
panel), but our tests show that it is doubtful that this is a
real feature of the CLF.
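
The counting estimator described at the start of this subsection can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the data layout, the function names, the toy selection function `toy_z_limit`, and the normalisation per dex in luminosity are all assumptions.

```python
import math


def clf_in_bin(groups, lum_lo, lum_hi, z_limit):
    """Estimate the CLF (galaxies per group per dex) in [lum_lo, lum_hi).

    groups  : list of (z_group, member_luminosities) for one halo-mass bin
    z_limit : callable giving the limiting redshift z_L for a luminosity;
              it stands in for the survey selection and is hypothetical.
    Evaluating z_limit at the faint edge of the bin is a conservative
    choice made here, not something specified in the text.
    """
    z_max = z_limit(lum_lo)
    usable = [members for z, members in groups if z <= z_max]
    if not usable:
        return None  # no group is complete for this luminosity bin
    n_gal = sum(lum_lo <= lum < lum_hi
                for members in usable for lum in members)
    return n_gal / len(usable) / (math.log10(lum_hi) - math.log10(lum_lo))


def toy_z_limit(lum):
    """Toy selection: brighter galaxies are visible out to higher redshift."""
    return 0.05 * math.sqrt(lum / 1e9)


# Made-up groups: (redshift, member luminosities in arbitrary solar units).
toy_groups = [(0.04, [2e9, 5e9]), (0.06, [3e9]), (0.20, [4e10])]
print(clf_in_bin(toy_groups, 1e9, 1e10, toy_z_limit))
```

Only the first toy group lies within the redshift limit for the chosen bin, so the estimate is based on that group alone. Groups beyond the limit are excluded from both the galaxy count and the group count.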

These results are interesting in light of the recent find- 
ings by Zheng et al. (2004), who computed the conditional 
baryonic mass functions (hereafter CMF) of galaxies in the 
semi-analytical models of Cole et al. (2000). In haloes with 
$12.5 \lesssim \log[M/M_\odot] \lesssim 14.0$ these CMFs reveal a clear peak
due to the central galaxies. Before we can claim that this 
is inconsistent with our results presented above, we need to 
show that if a peak is present in the CLF, our analysis is 
able to recover it. To test this, we construct eight MGRSs 
based on CLFs that contain artificial peaks at the bright 
end, similar to the CMFs in Zheng et al. (2004). The in- 
put CLFs are shown as the solid curves in Fig. 10. We then 
apply all the observational selections to these MGRSs and 
select groups using exactly the same method as that for the 
real groups. The recovered CLFs in different mass ranges 
are shown in Fig. 10 as the histograms (the errorbars are the
1-$\sigma$ scatter among the eight MGRSs). Comparing these with
the model input, we see that our analysis is able to recover
prominent peaks in the CLFs, although weak and sharp
features may be smeared out. Thus, if the CLF for haloes with
$M \lesssim 10^{13.5} h^{-1} M_\odot$ indeed contained peaks as prominent
as those predicted by the semi-analytical model mentioned
above, our analysis would have easily revealed them.
Therefore we conclude that the true CLF, as extracted from the
2dFGRS, does not reveal any prominent peaks. This
suggests a problem for the semi-analytical models of Cole et
al. (2000), although we caution that any disagreement
between the CLF (based on luminosity in the photometric
$b_J$-band) and the CMF (based on baryonic mass) should be
interpreted with extreme care.

Figure 9. The conditional luminosity functions for groups in different mass bins, as indicated. Results are shown separately for all (central plus satellite) galaxies (the broader histogram) and satellite galaxies (the narrower histogram), respectively. The left-hand panels are for 2dF groups, while the middle panels are for groups in the fiducial MGRSs. In all these cases, halo masses are based on the rank-ordering of the group $L_{18}$ luminosities. To test the impact of the error in the $L_{18}$-$M$ conversion, we show in the right panels the CLFs obtained from the fiducial MGRSs with groups binned according to their true halo masses. Errorbars in all panels are obtained from the 1-$\sigma$ scatter among the 8 fiducial MGRSs. As a comparison, the solid curves indicate the input CLFs (which are of Schechter form) used to construct the fiducial MGRSs.

Figure 10. The conditional luminosity functions for groups in different mass bins, as indicated. The solid curves show the input CLFs, which contain peaks at the bright end to mimic the conditional baryonic mass functions found by Zheng et al. (2004). The histograms are the CLFs recovered from the groups selected from the MGRSs constructed with these input CLFs. Errorbars in all panels are based on the 1-$\sigma$ scatter among 8 MGRSs.



4.2 Comparison with CLF Models 

Having measured the CLFs from the groups in the 2dF- 
GRS, we now turn to a comparison with the results obtained 
from the MGRSs and with the actual input CLFs used 
to construct them. Recall that the input CLFs are based 
on matching the observed luminosity function of galaxies 
and the luminosity-dependence of the correlation length of 
galaxies. Comparing the 2dF results shown in the left-hand 
panels of Fig. 9 with the results obtained from the fiducial 
MGRSs (the middle panels), we see that the observed CLFs
have shapes that are similar to our model predictions, but
with lower amplitudes. This is simply another reflection
of the discrepancy between the 2dFGRS and our fiducial 
CLF model regarding the abundances of satellite galaxies 
(see Section 3.2 and also van den Bosch et al. 2004b). Simi- 
lar discrepancies have previously been noticed from the pair- 
wise peculiar velocity dispersions (Yang et al. 2004a) and the 
multiplicity function of galaxy groups (Yang et al. 2004b). 
As shown in these studies, these discrepancies indicate either
a relatively high mass-to-light ratio on cluster scales, or a
relatively low value of $\sigma_8$. To test how the corresponding CLF
models compare to the CLF derived directly from the groups
in the 2dFGRS, the dotted lines in Fig. 11 indicate the CLFs
obtained from the MGRSs with $(M/L)_{\rm cl} \sim 900h\,(M/L)_\odot$.
Results are shown separately for early-type, late-type, and
all galaxies. Clearly, this model is in much better agreement
with the 2dFGRS than the fiducial model. Note that the
model also very nicely reproduces the CLFs of the early-
and late-type galaxies separately. As for our fiducial mocks,
the input CLFs (solid smooth curves) are well reproduced
by the MGRSs for haloes more massive than $10^{13.3} h^{-1} M_\odot$,
while for less massive haloes a small peak arises and the
groups become incomplete at the faint end.

Figure 11. The conditional luminosity functions for 2dF groups in different mass bins (histograms), as indicated. Results are shown separately for late-type galaxies (left panels), early-type galaxies (middle panels) and all (early- and late-type) galaxies (right panels). Errorbars in all panels are obtained from the 1-$\sigma$ scatter among 8 MGRSs. The solid curves indicate the input CLFs used to construct the MGRSs with $(M/L)_{\rm cl} = 900h\,(M/L)_\odot$. The dotted lines are the CLFs recovered from the groups in these MGRSs, with halo masses estimated from the $L_{18}$ ranking.

Unfortunately, we do not have a set of numerical
simulations for a $\Lambda$CDM cosmology with $\sigma_8 = 0.7$, so that we
cannot construct corresponding MGRSs (but see Yang et
al. 2004a for an approximate method). However, numerous
tests discussed in our previous work suggest that this model
is virtually indistinguishable from that with $\sigma_8 = 0.9$ and
$(M/L)_{\rm cl} \sim 900h\,(M/L)_\odot$. We therefore conclude that the
CLFs determined directly from the groups in 2dFGRS
provide further support for both models as viable descriptions
of the galaxy-dark matter connection.



5 CONCLUSIONS 

Using the galaxy group catalogue constructed from the 2dF- 
GRS by Yang et al. (2004b), we have investigated various 
aspects regarding the halo occupation statistics of galaxies 
in the 2dFGRS. This is the first time the halo occupation 
distribution in real galaxy systems is examined in such de- 
tail, and has resulted in a number of interesting results that 
shed light on the connection between galaxies and dark mat- 
ter haloes. 

In order to estimate halo masses associated with the 
galaxy groups, we have ranked groups according to their 
group luminosity. Under the ansatz that the group lumi- 
nosity is tightly correlated with halo mass, this is similar 
to mass ranking, and one can use the mean separation be- 
tween the groups above a given ranking to determine the 
corresponding halo mass. Since any stochasticity in the re- 
lation between group luminosity and halo mass causes er- 
rors in the derived group masses, it is essential to use mock 
galaxy redshift surveys (MGRSs) to properly interpret the 
results. In this study we used MGRSs constructed using the 
conditional luminosity function (CLF) which has been con- 
strained by the abundance and clustering properties of the 
galaxies in the 2dFGRS. 
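
The mass assignment summarised above can be sketched as a ranking followed by a density match. The example is illustrative only: it takes the mean separation as $d = n^{-1/3}$, and the halo mass function is supplied through a caller-provided inverse, `mass_at_density`, for which a made-up power law is used here. None of the numbers are from the paper.

```python
def assign_halo_masses(group_lums, volume, mass_at_density):
    """Assign halo masses to groups by luminosity ranking.

    group_lums      : group luminosities (any consistent unit)
    volume          : comoving survey volume, in (h^-1 Mpc)^3
    mass_at_density : callable returning the mass M at which the cumulative
                      halo number density n(>M) equals its argument
    Returns a list of (mean separation d, halo mass) in input order.
    """
    order = sorted(range(len(group_lums)),
                   key=lambda i: group_lums[i], reverse=True)
    result = [None] * len(group_lums)
    for rank, i in enumerate(order, start=1):
        density = rank / volume                # n(>L) for this group
        separation = density ** (-1.0 / 3.0)   # mean separation d
        result[i] = (separation, mass_at_density(density))
    return result


def toy_mass_at_density(density):
    """Invert a made-up mass function n(>M) = 1e-3 * (M / 1e13)^-1."""
    return 1e13 * 1e-3 / density


print(assign_halo_masses([3.0, 1.0, 2.0], 1e3, toy_mass_at_density))
```

Brighter groups receive smaller mean separations and hence larger masses. Any scatter between group luminosity and true halo mass is ignored by construction, which is why the mock surveys are needed to interpret the results.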

The first statistic we have investigated is the mean
occupation number of galaxies above a given luminosity
limit. Using the MGRS we have shown that this statistic
can be determined from the galaxy groups extremely
reliably, except for low mass haloes where $\langle N \rangle \sim 1$. Here the
stochasticity in the occupation statistics causes a
systematic error that mimics a flat shoulder and a sharp break in
the derived $\langle N \rangle_M$. Yet, the comparison between 2dFGRS
and MGRS, both of which suffer from the same
systematic, is meaningful and allows one to test whether the
occupation statistics used to construct the MGRS (i.e., the
CLF and its sampling strategy) are in agreement with the
data. In terms of the mean occupation numbers, we find
that our fiducial MGRS overestimates $\langle N \rangle$ in high mass
haloes with respect to the 2dFGRS. This overabundance
of satellite galaxies in massive haloes was previously noticed
in van den Bosch et al. (2004b) and Yang et al. (2004b),
and indicates that either clusters have an average
mass-to-light ratio of $(M/L)_{\rm cl} \sim 900h\,(M/L)_\odot$ (compared to
$(M/L)_{\rm cl} = 500h\,(M/L)_\odot$ in our fiducial model), or that the
power-spectrum normalization is relatively low, with $\sigma_8 \sim 0.7$
rather than 0.9. Similar conclusions were reached by Yang et
al. (2004a) from a detailed analysis of the pairwise-peculiar
velocity dispersions of galaxies in the 2dFGRS.

In addition to the mean, we have also investigated 
the scatter in the occupation statistics, using the ratio
$\sigma(N)/\sqrt{\langle N \rangle}$. In massive haloes we find this ratio to be
close to unity, indicating that the occupation distribution
$P(N|M)$ is close to Poissonian. In low-mass haloes, however,
there is a pronounced minimum, indicating a $P(N|M)$
that is significantly narrower than a Poisson distribution. 
We have shown that these findings are in excellent agree- 
ment with our fiducial MGRSs, but only if we sample the 
luminosity of the brightest galaxy in each halo in a some- 
what deterministic way. Without such special treatment, 
i.e., when drawing all luminosities at random from the CLF,
the ratio $\sigma(N)/\sqrt{\langle N \rangle}$ is no longer in agreement with that of
the 2dFGRS. These findings suggest that (i) the occupation 
statistics of central galaxies are sub-Poissonian, indicating 
some deterministic behavior in galaxy formation, and (ii) 
the number of satellite galaxies above a certain luminosity 
limit follows a Poissonian distribution. This is in excellent 
agreement with a scenario in which satellite galaxies are as- 
sociated with dark matter subhaloes, which, as shown by 
Kravtsov et al. (2004), also follow Poissonian occupation 
statistics. 
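
As a rough illustration of why this ratio separates the two cases, the sketch below computes $\sigma(N)/\sqrt{\langle N \rangle}$ for a sample of occupation numbers and gives the analytic value for an idealized model. The model is an assumption made here for illustration only: every halo hosts exactly one central galaxy above the luminosity limit plus a Poisson-distributed number of satellites with mean `mean_sat`. The function names are hypothetical.

```python
import numpy as np

def scatter_ratio(counts):
    """Return sigma(N)/sqrt(<N>) for occupation numbers in one halo-mass bin."""
    counts = np.asarray(counts, dtype=float)
    return counts.std() / np.sqrt(counts.mean())

def ratio_central_plus_poisson(mean_sat):
    """Analytic ratio for N = 1 + N_sat, with N_sat Poisson of mean mean_sat.

    The fixed central adds 1 to the mean but nothing to the variance, so
    sigma^2 = mean_sat and <N> = 1 + mean_sat (illustrative assumption).
    """
    return np.sqrt(mean_sat / (1.0 + mean_sat))
```

A pure Poisson sample gives a ratio of unity for any mean. In the idealized model the ratio drops well below unity when `mean_sat` is small (low-mass haloes, $\langle N \rangle \sim 1$) and approaches unity when `mean_sat` is large (massive haloes), which is qualitatively the behaviour described above.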

The mean luminosity of the central galaxies, $L_c$, is
found to scale with halo mass as $L_c \propto M^{2/3}$ for haloes
with masses $M < 10^{13} h^{-1} M_\odot$, and as $L_c \propto M^{1/4}$ for more
massive haloes. At the low-mass end, this is in excellent
agreement with results based on galaxy-galaxy weak lensing,
which imply that $M \propto L_c^{1.5}$ (e.g. Yang et al. 2003a; Guzik &
Seljak 2002). The characteristic break at $M \sim 10^{13} h^{-1} M_\odot$
indicates the existence of a characteristic scale in galaxy for- 
mation, thought to be associated with the transition from ef- 
fective to ineffective cooling (e.g., White & Rees 1978; Dekel 
2004). Although not directly revealed by our galaxy groups, 
another characteristic mass, $M \sim 10^{11} h^{-1} M_\odot$, can be inferred
from our CLFs obtained from the 2dFGRS. Below
this mass, star formation efficiency decreases rapidly with 
decreasing halo mass, presumably due to feedback from su- 
pernovae. We have also investigated the full distribution 
of central luminosity, $P(L_c|M)$. Although it appears log-normal,
detailed tests show that the group luminosity ranking
used to estimate halo masses causes systematic errors in
$P(L_c|M)$ (though the mean is not affected). The comparison
with the MGRS, however, is still meaningful and shows
excellent agreement, providing further support for the CLF 
and its sampling strategy. 
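
The two scalings quoted above can be written as a single broken power law, continuous at the break. The sketch below is only a convenience for visualizing the relation: the normalization `L_BREAK` is an arbitrary placeholder (no value is taken from the data), the break is treated as infinitely sharp, and the second characteristic scale near $10^{11} h^{-1} M_\odot$ is not modelled.

```python
import numpy as np

M_BREAK = 1e13   # break mass in h^-1 Msun, as quoted in the text
L_BREAK = 1.0    # central luminosity at the break; arbitrary placeholder units

def mean_central_luminosity(mass):
    """Broken power law: L_c ~ M^(2/3) below M_BREAK, L_c ~ M^(1/4) above."""
    x = np.asarray(mass, dtype=float) / M_BREAK
    slope = np.where(x < 1.0, 2.0 / 3.0, 1.0 / 4.0)
    return L_BREAK * x ** slope
```

Inverting the low-mass branch gives $M \propto L_c^{3/2}$, the same exponent as the weak-lensing scaling cited in this paragraph.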

In addition to a split into central and satellite galaxies,
we have also divided the population into early- and late-type
galaxies. The central galaxies in low-mass haloes are found
to be predominantly late-type galaxies, while those in massive
haloes are almost entirely early types. This is in good
agreement with the occupation statistics obtained from an 
analysis of the clustering properties of early- and late-type
galaxies (van den Bosch et al. 2003a). 

Using the 2dF groups, we have also measured the con- 
ditional luminosity function directly. Although the CLF of 
central galaxies is fairly narrow, the presence of central 
galaxies does not show up as a strong peak at the bright 
end of the total CLF. In fact, over the entire halo mass 
range that can be reliably probed with the present data 
(from $\sim 10^{13.3} h^{-1} M_\odot$ to $\sim 10^{14.7} h^{-1} M_\odot$), the CLF is well
fit by a Schechter function. This supports the assumption 
regarding the shape of the CLF made in our previous work, 
but disagrees with the conditional baryonic mass function 
(CMF) in semi-analytical models of galaxy formation. As 
shown by Zheng et al. (2004), the latter reveals a pronounced
peak due to the central galaxies. We caution, however, that
any disagreement between the CLF (based on luminosity in
the photometric $b_J$-band) and the CMF (based on baryonic
mass) should be interpreted with care. 
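
For reference, the standard Schechter form and the mean number of galaxies it implies above a luminosity limit can be sketched as follows. The parameter names `phi_star`, `l_star` and `alpha` are generic; in a CLF they would depend on halo mass, and no fitted values from the 2dFGRS groups are assumed here. The helper `mean_number_above` and its integration settings are illustrative choices.

```python
import numpy as np

def schechter(lum, phi_star, l_star, alpha):
    """Schechter function: number of galaxies per unit luminosity."""
    x = np.asarray(lum, dtype=float) / l_star
    return (phi_star / l_star) * x ** alpha * np.exp(-x)

def mean_number_above(l_min, phi_star, l_star, alpha,
                      l_max_factor=50.0, n_points=4000):
    """Mean number of galaxies brighter than l_min (requires l_min > 0).

    Integrates the Schechter function from l_min up to l_max_factor * l_star
    on a logarithmic grid with the trapezoidal rule; the exponential cutoff
    makes the contribution beyond the upper limit negligible.
    """
    lum = np.logspace(np.log10(l_min), np.log10(l_max_factor * l_star), n_points)
    phi = schechter(lum, phi_star, l_star, alpha)
    return float(np.sum(0.5 * (phi[1:] + phi[:-1]) * np.diff(lum)))
```

Evaluating `mean_number_above` at a fixed luminosity limit for a set of halo-mass-dependent parameters gives the mean occupation number discussed earlier in this section.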

The CLFs obtained from the galaxy groups in the 2dF- 
GRS are in good agreement with the CLF model based on 
matching the observed luminosity function and large-scale 
clustering properties of galaxies in the $\Lambda$CDM concordance
cosmology. This indicates that the model provides an accurate
description of the connection between galaxies and dark
matter haloes, provided that either $\sigma_8 \simeq 0.7$, or,
if $\sigma_8 = 0.9$, that clusters have a mass-to-light ratio that is
significantly higher than typically found. Finally we point 
out that, with the completion of the SDSS, the analysis pre- 
sented here can be naturally extended to include a wider 
variety of intrinsic properties of individual galaxies (in addi- 
tion to luminosity and type), to investigate the occupation 
statistics as a function of color, surface brightness, AGN activity,
etc. The results from such analyses will provide unprecedented
constraints on how galaxies of different physical
properties form in dark matter haloes of different masses. 